Project 3#

How Economic Prosperity Shapes Global Life Expectancy 🌍#

Name: Zenia Clarissa Bhaswata Putri

UNI: zc2709

Introduction#

Objective: Investigate whether higher GDP per capita is associated with better health outcomes by analyzing the relationship between GDP per capita and life expectancy across countries, focusing on 2021, the latest year available in both datasets.

Hypothesis: In 2021, countries with higher GDP per capita are expected to have higher life expectancy due to better healthcare access, nutrition, and standards of living.

Analysis Question: Is there a positive correlation between GDP per capita and life expectancy across countries in 2021, suggesting that economic prosperity leads to improved healthcare access, nutrition, and standards of living?

Datasets to be used: I am using the GDP per capita dataset from the World Bank and the life expectancy dataset from the WHO. The data covers 179 countries for the years 2012 to 2021. For the final visualization and analysis I chose 2021 because it is the latest year for which both GDP per capita and life expectancy are available. It is like working with the freshest ingredients: you get the most up-to-date snapshot of global trends. Plus, using the same year for both datasets keeps the comparison consistent, making the analysis more reliable and meaningful.

Step 1: Load and Inspect Datasets#

  • Import necessary Python packages (pandas for data handling)

  • Load both datasets into separate dataframes and inspect the structure to understand columns and missing data

I began by importing the necessary packages – pandas and plotly. Pandas is used for data manipulation, while plotly is for creating visualizations.

import plotly.io as pio
pio.renderers.default = "vscode+jupyterlab+notebook_connected"
import pandas as pd
import plotly.express as px

Next, I read my main dataframe for economic prosperity (GDP_percapita_data.csv) into the notebook to start exploring the data. I displayed the first few rows to verify that the file had been read correctly and to understand its structure. After that, I applied some basic functions to check the contents of key columns and look for missing values. I also listed the unique country codes to prepare for merging with the WHO life expectancy data. This initial exploration is crucial to ensure that the data is ready for analysis and visualization.

gdp_percapita = pd.read_csv("GDP_percapita_data.csv")
gdp_percapita.head()
Series Name Series Code Country Name Country Code 2008 [YR2008] 2012 [YR2012] 2013 [YR2013] 2014 [YR2014] 2015 [YR2015] 2016 [YR2016] 2017 [YR2017] 2018 [YR2018] 2019 [YR2019] 2020 [YR2020] 2021 [YR2021] 2022 [YR2022]
0 GDP per capita (current US$) NY.GDP.PCAP.CD Afghanistan AFG 382.5338072 653.4174749 638.733181 626.5129291 566.8811297 523.053012 526.140801 492.090631 497.7414313 512.055098 355.7778264 352.6037331
1 GDP per capita (current US$) NY.GDP.PCAP.CD Albania ALB 4370.539716 4247.631343 4413.063383 4578.633208 3952.803574 4124.05539 4531.032207 5287.660801 5396.214243 5343.037704 6377.203096 6810.114041
2 GDP per capita (current US$) NY.GDP.PCAP.CD Algeria DZA 5217.991822 6096.090015 6044.674903 6164.644699 4741.49977 4481.081962 4615.868744 4640.314145 4530.101745 3794.409524 4216.251285 5023.252932
3 GDP per capita (current US$) NY.GDP.PCAP.CD American Samoa ASM 10019.50225 11920.06109 12038.87159 12313.99736 13101.54182 13300.82461 12372.88478 13195.9359 13672.57666 15609.77722 16653.71378 19673.3901
4 GDP per capita (current US$) NY.GDP.PCAP.CD Andorra AND 53938.85213 44902.38077 44747.75386 45680.53499 38885.53032 39931.21698 40632.23155 42904.82846 41328.6005 37207.222 42066.49052 42350.69707
gdp_percapita.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 217 entries, 0 to 216
Data columns (total 16 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Series Name    217 non-null    object
 1   Series Code    217 non-null    object
 2   Country Name   217 non-null    object
 3   Country Code   217 non-null    object
 4   2008 [YR2008]  217 non-null    object
 5   2012 [YR2012]  217 non-null    object
 6   2013 [YR2013]  217 non-null    object
 7   2014 [YR2014]  217 non-null    object
 8   2015 [YR2015]  217 non-null    object
 9   2016 [YR2016]  217 non-null    object
 10  2017 [YR2017]  217 non-null    object
 11  2018 [YR2018]  217 non-null    object
 12  2019 [YR2019]  217 non-null    object
 13  2020 [YR2020]  217 non-null    object
 14  2021 [YR2021]  217 non-null    object
 15  2022 [YR2022]  217 non-null    object
dtypes: object(16)
memory usage: 27.3+ KB
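
The info() output lists every year column as object rather than a numeric type, which usually means some cells hold text. Below is a minimal sketch of how one might locate such cells. It assumes the export marks missing values with a ".." placeholder (an assumption about this file), and the toy frame is illustrative, not the real data.

```python
import pandas as pd

# Toy stand-in for the World Bank export; the ".." placeholder is an assumption.
toy_gdp = pd.DataFrame({
    "Country Code": ["AAA", "BBB", "CCC"],
    "2021 [YR2021]": ["355.78", "..", "6377.20"],
})

# Convert the text to numbers; anything that cannot be parsed becomes NaN.
as_number = pd.to_numeric(toy_gdp["2021 [YR2021]"], errors="coerce")

# Show the original text of the cells that failed to parse.
unparsed = toy_gdp.loc[as_number.isna(), "2021 [YR2021]"]
print(unparsed.value_counts())
```

Running the same check on each year column of the real dataframe would show which placeholder strings, if any, are responsible for the object dtype.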

To confirm that the GDP per capita data uses three-letter ISO country codes, I listed the unique values in the Country Code column:

gdp_percapita["Country Code"].unique()
array(['AFG', 'ALB', 'DZA', 'ASM', 'AND', 'AGO', 'ATG', 'ARG', 'ARM',
       'ABW', 'AUS', 'AUT', 'AZE', 'BHS', 'BHR', 'BGD', 'BRB', 'BLR',
       'BEL', 'BLZ', 'BEN', 'BMU', 'BTN', 'BOL', 'BIH', 'BWA', 'BRA',
       'VGB', 'BRN', 'BGR', 'BFA', 'BDI', 'CPV', 'KHM', 'CMR', 'CAN',
       'CYM', 'CAF', 'TCD', 'CHI', 'CHL', 'CHN', 'COL', 'COM', 'COD',
       'COG', 'CRI', 'CIV', 'HRV', 'CUB', 'CUW', 'CYP', 'CZE', 'DNK',
       'DJI', 'DMA', 'DOM', 'ECU', 'EGY', 'SLV', 'GNQ', 'ERI', 'EST',
       'ETH', 'FRO', 'FJI', 'FIN', 'FRA', 'PYF', 'GAB', 'GMB', 'GEO',
       'DEU', 'GHA', 'GIB', 'GRC', 'GRL', 'GRD', 'GUM', 'GTM', 'GIN',
       'GNB', 'GUY', 'HTI', 'HND', 'HKG', 'HUN', 'ISL', 'IND', 'IDN',
       'IRN', 'IRQ', 'IRL', 'IMN', 'ISR', 'ITA', 'JAM', 'JPN', 'JOR',
       'KAZ', 'KEN', 'KIR', 'PRK', 'KOR', 'XKX', 'KWT', 'KGZ', 'LAO',
       'LVA', 'LBN', 'LSO', 'LBR', 'LBY', 'LIE', 'LTU', 'LUX', 'MAC',
       'MKD', 'MDG', 'MWI', 'MYS', 'MDV', 'MLI', 'MLT', 'MHL', 'MRT',
       'MUS', 'MEX', 'FSM', 'MDA', 'MCO', 'MNG', 'MNE', 'MAR', 'MOZ',
       'MMR', 'NAM', 'NRU', 'NPL', 'NLD', 'NCL', 'NZL', 'NIC', 'NER',
       'NGA', 'MNP', 'NOR', 'OMN', 'PAK', 'PLW', 'PAN', 'PNG', 'PRY',
       'PER', 'PHL', 'POL', 'PRT', 'PRI', 'QAT', 'ROU', 'RUS', 'RWA',
       'WSM', 'SMR', 'STP', 'SAU', 'SEN', 'SRB', 'SYC', 'SLE', 'SGP',
       'SXM', 'SVK', 'SVN', 'SLB', 'SOM', 'ZAF', 'SSD', 'ESP', 'LKA',
       'KNA', 'LCA', 'MAF', 'VCT', 'SDN', 'SUR', 'SWZ', 'SWE', 'CHE',
       'SYR', 'TJK', 'TZA', 'THA', 'TLS', 'TGO', 'TON', 'TTO', 'TUN',
       'TUR', 'TKM', 'TCA', 'TUV', 'UGA', 'UKR', 'ARE', 'GBR', 'USA',
       'URY', 'UZB', 'VUT', 'VEN', 'VNM', 'VIR', 'PSE', 'YEM', 'ZMB',
       'ZWE'], dtype=object)
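
Listing the codes shows their format, but it does not show whether the same codes appear in the WHO file. Once both files are loaded, a set comparison along the following lines could surface codes that exist in only one source. The two short code lists here are hypothetical placeholders, not the real columns.

```python
import pandas as pd

# Hypothetical miniature code lists; swap in the real columns when both files are loaded.
gdp_codes = pd.Series(["AFG", "ALB", "CHI", "XKX"], name="Country Code")
who_codes = pd.Series(["AFG", "ALB", "COK"], name="SpatialDimValueCode")

# Codes present in one source but missing from the other.
only_in_gdp = sorted(set(gdp_codes) - set(who_codes))
only_in_who = sorted(set(who_codes) - set(gdp_codes))

# Format check: every code should be exactly three uppercase letters.
all_three_letters = bool(gdp_codes.str.fullmatch(r"[A-Z]{3}").all())

print("Only in GDP data:", only_in_gdp)
print("Only in WHO data:", only_in_who)
print("All codes are three letters:", all_three_letters)
```

Codes that appear on only one side are silently dropped by an inner merge, so this kind of check helps explain any difference in row counts later on.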

Since the GDP per capita data already has all the columns I need, there’s no point in removing anything. No need to overthink it—every column included seems to be relevant for answering the questions I’m looking into.

Next, I do the same things with the WHO data:

life_exp = pd.read_csv("WHO_data3.csv")
life_exp.head()
IndicatorCode Indicator ValueType ParentLocationCode ParentLocation Location type SpatialDimValueCode Location Period type Period ... FactValueUoM FactValueNumericLowPrefix FactValueNumericLow FactValueNumericHighPrefix FactValueNumericHigh Value FactValueTranslationID FactComments Language DateModified
0 WHOSIS_000001 Life expectancy at birth (years) text AFR Africa Country LSO Lesotho Year 2021 ... NaN NaN 50.49 NaN 52.57 51.5 [50.5-52.6] NaN NaN EN 2024-08-02T04:00:00.000Z
1 WHOSIS_000001 Life expectancy at birth (years) text AFR Africa Country CAF Central African Republic Year 2021 ... NaN NaN 51.06 NaN 53.36 52.3 [51.1-53.4] NaN NaN EN 2024-08-02T04:00:00.000Z
2 WHOSIS_000001 Life expectancy at birth (years) text EMR Eastern Mediterranean Country SOM Somalia Year 2021 ... NaN NaN 52.92 NaN 55.11 54.0 [52.9-55.1] NaN NaN EN 2024-08-02T04:00:00.000Z
3 WHOSIS_000001 Life expectancy at birth (years) text AFR Africa Country SWZ Eswatini Year 2021 ... NaN NaN 53.49 NaN 55.87 54.6 [53.5-55.9] NaN NaN EN 2024-08-02T04:00:00.000Z
4 WHOSIS_000001 Life expectancy at birth (years) text AFR Africa Country MOZ Mozambique Year 2021 ... NaN NaN 56.64 NaN 58.77 57.7 [56.6-58.8] NaN NaN EN 2024-08-02T04:00:00.000Z

5 rows × 34 columns

life_exp.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1850 entries, 0 to 1849
Data columns (total 34 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   IndicatorCode               1850 non-null   object 
 1   Indicator                   1850 non-null   object 
 2   ValueType                   1850 non-null   object 
 3   ParentLocationCode          1850 non-null   object 
 4   ParentLocation              1850 non-null   object 
 5   Location type               1850 non-null   object 
 6   SpatialDimValueCode         1850 non-null   object 
 7   Location                    1850 non-null   object 
 8   Period type                 1850 non-null   object 
 9   Period                      1850 non-null   int64  
 10  IsLatestYear                1850 non-null   bool   
 11  Dim1 type                   1850 non-null   object 
 12  Dim1                        1850 non-null   object 
 13  Dim1ValueCode               1850 non-null   object 
 14  Dim2 type                   0 non-null      float64
 15  Dim2                        0 non-null      float64
 16  Dim2ValueCode               0 non-null      float64
 17  Dim3 type                   0 non-null      float64
 18  Dim3                        0 non-null      float64
 19  Dim3ValueCode               0 non-null      float64
 20  DataSourceDimValueCode      0 non-null      float64
 21  DataSource                  0 non-null      float64
 22  FactValueNumericPrefix      0 non-null      float64
 23  FactValueNumeric            1850 non-null   float64
 24  FactValueUoM                0 non-null      float64
 25  FactValueNumericLowPrefix   0 non-null      float64
 26  FactValueNumericLow         1847 non-null   float64
 27  FactValueNumericHighPrefix  0 non-null      float64
 28  FactValueNumericHigh        1847 non-null   float64
 29  Value                       1850 non-null   object 
 30  FactValueTranslationID      0 non-null      float64
 31  FactComments                0 non-null      float64
 32  Language                    1850 non-null   object 
 33  DateModified                1850 non-null   object 
dtypes: bool(1), float64(17), int64(1), object(15)
memory usage: 478.9+ KB

Step 2: Data Cleaning & Pre-processing#

  • Handle missing or invalid data by filtering or imputing values

  • Reshaping the data format (columns/rows)

  • Standardize country names across datasets (using the ISO three-letter country code)

  • Ensure time periods align (e.g. in this case we want to look at data in 2021)

Cleaning and pre-processing GDP percapita data

Columns that will likely be used: Country Name, Country Code, and the yearly columns (2008 [YR2008] through 2022 [YR2022]).

To prepare the GDP per capita dataset for analysis, I began by selecting the necessary columns and reshaping the data. The original format had years as separate columns for each country, so I transformed/reshaped it into a tidy format with each year as a separate row (following the Life Expectancy data format). Additionally, I cleaned the year columns to remove the [YRxxxx] format, leaving only the numeric year. These steps ensured the dataset was organized, consistent, and ready for analysis alongside the life expectancy data.

# Cleaning and preprocessing GDP data
# Selecting necessary columns and reshaping the data
gdp_cleaned = gdp_percapita.melt(
    id_vars=["Country Name", "Country Code"],
    value_vars=[col for col in gdp_percapita.columns if "[YR" in col],
    var_name="Year",
    value_name="GDP per Capita"
)

# Extract the year from column names
gdp_cleaned["Year"] = gdp_cleaned["Year"].str.extract(r"(\d{4})").astype(int)
gdp_cleaned["GDP per Capita"] = pd.to_numeric(gdp_cleaned["GDP per Capita"], errors="coerce")
# Displaying the cleaned World Bank dataframe
gdp_cleaned.head()
Country Name Country Code Year GDP per Capita
0 Afghanistan AFG 2008 382.533807
1 Albania ALB 2008 4370.539716
2 Algeria DZA 2008 5217.991822
3 American Samoa ASM 2008 10019.502250
4 Andorra AND 2008 53938.852130
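
A quick way to confirm that a wide-to-long reshape lost nothing is to compare row counts before and after. The sketch below uses a made-up two-country frame (the names and values are placeholders) and checks that the long table has one row per country and year.

```python
import pandas as pd

# Made-up wide table in the same layout as the World Bank export.
wide = pd.DataFrame({
    "Country Name": ["A-land", "B-land"],
    "Country Code": ["AAA", "BBB"],
    "2020 [YR2020]": ["1.0", "2.0"],
    "2021 [YR2021]": ["3.0", ".."],
})

year_cols = [c for c in wide.columns if "[YR" in c]
long = wide.melt(
    id_vars=["Country Name", "Country Code"],
    value_vars=year_cols,
    var_name="Year",
    value_name="GDP per Capita",
)
# Keep only the four-digit year from labels such as "2021 [YR2021]".
long["Year"] = long["Year"].str.extract(r"(\d{4})", expand=False).astype(int)

expected_rows = len(wide) * len(year_cols)
has_duplicates = bool(long.duplicated(["Country Code", "Year"]).any())
print(len(long), expected_rows, has_duplicates)
```

Applied to the real data, the same arithmetic gives 217 countries × 12 year columns = 2,604 rows in gdp_cleaned.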

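The step below was meant to strip the [YRxxxx] suffix from the year column names. Because the melt above already moved those labels into the Year column, the rename leaves the column names unchanged, as the identical output shows.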

# Cleaning the GDP per capita data to remove '[YRxxxx]' from year columns
gdp_cleaned.columns = gdp_cleaned.columns.str.replace(r'\[YR\d{4}\]', '', regex=True).str.strip()

gdp_cleaned.head()
Country Name Country Code Year GDP per Capita
0 Afghanistan AFG 2008 382.533807
1 Albania ALB 2008 4370.539716
2 Algeria DZA 2008 5217.991822
3 American Samoa ASM 2008 10019.502250
4 Andorra AND 2008 53938.852130

Cleaning and pre-processing Life Expectancy data

Columns that will likely be used: SpatialDimValueCode, Period, and FactValueNumeric.

After inspecting the contents of the WHO data, I identified several columns that were not relevant to the questions I am investigating. Columns such as ‘FactComments’, ‘FactValueTranslationID’, and others contained additional metadata that did not contribute to my analysis of GDP per capita and life expectancy. Since I mainly needed the ‘SpatialDimValueCode’, ‘Period’, and ‘FactValueNumeric’ columns to link location, year, and life expectancy, I dropped the metadata columns and kept only those three plus ‘Location’, ‘Period type’, and ‘IsLatestYear’.

Additionally, I renamed the remaining columns to make the dataset more intuitive and visually appealing. For instance, I simplified ‘SpatialDimValueCode’ to ‘Country Code’, ‘Period’ to ‘Year’ and ‘FactValueNumeric’ to ‘Life Expectancy’. This step helped streamline the dataframe for analysis and visualization, ensuring it contained only the necessary and meaningful data.

# Re-assigning for this isolated environment
life_exp = pd.read_csv('WHO_data3.csv')

# Dropping unnecessary columns
columns_to_drop = [
    "IndicatorCode", "Indicator", "ValueType", "ParentLocation",
    "Location type", "ParentLocationCode", "Dim1 type", "Dim1",
    "Dim1ValueCode", "Dim2 type", "Dim2", "Dim2ValueCode", 
    "Dim3 type", "Dim3", "Dim3ValueCode", "DataSourceDimValueCode", 
    "DataSource", "FactValueNumericPrefix", "FactValueUoM", 
    "FactValueNumericLowPrefix", "FactValueNumericLow", 
    "FactValueNumericHighPrefix", "FactValueNumericHigh", 
    "Value", "FactValueTranslationID", "FactComments", 
    "Language", "DateModified"
]

# Dropping the columns
life_exp.drop(columns=columns_to_drop, axis=1, inplace=True)

# Renaming columns for clarity
life_exp.rename(columns={
    "SpatialDimValueCode": "Country Code",
    "Period": "Year",
    "FactValueNumeric": "Life Expectancy"
}, inplace=True)
# Storing the WHO cleaned dataframe
life_exp_cleaned = life_exp
# Displaying the WHO cleaned dataframe
life_exp_cleaned.head()
Country Code Location Period type Year IsLatestYear Life Expectancy
0 LSO Lesotho Year 2021 True 51.48
1 CAF Central African Republic Year 2021 True 52.31
2 SOM Somalia Year 2021 True 53.95
3 SWZ Eswatini Year 2021 True 54.59
4 MOZ Mozambique Year 2021 True 57.66
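
An alternative to listing every column to drop is to list only the columns to keep. The sketch below shows that pattern on a toy frame. The two life expectancy values come from the output above, and the frame itself is only an illustration of the approach, not the author's code.

```python
import pandas as pd

# Toy stand-in for the WHO export with one unwanted metadata column.
toy_who = pd.DataFrame({
    "SpatialDimValueCode": ["LSO", "JPN"],
    "Location": ["Lesotho", "Japan"],
    "Period": [2021, 2021],
    "FactValueNumeric": [51.48, 84.46],
    "FactComments": [None, None],
})

# Map each column to keep onto its clearer name.
keep = {
    "SpatialDimValueCode": "Country Code",
    "Location": "Location",
    "Period": "Year",
    "FactValueNumeric": "Life Expectancy",
}

# Select, rename, and copy so later edits do not touch the original frame.
tidy = toy_who[list(keep)].rename(columns=keep).copy()
print(tidy)
```

Selecting by name keeps the cleaning step short and makes it robust to extra metadata columns appearing in a future download.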

Filtering for the year 2021#

Life Expectancy Data

As mentioned previously, I chose 2021 for my analysis, so I filtered the life expectancy data to that year.

# Filtering the data for the year 2021
life_exp_2021 = life_exp_cleaned[life_exp_cleaned["Year"] == 2021]

I also sorted the data in descending order of life expectancy, so that the countries with the highest life expectancy appear at the top.

# Sorting the data by Life Expectancy in descending order
life_exp_2021_sorted = life_exp_2021.sort_values(by="Life Expectancy", ascending=False)

life_exp_2021_sorted
Country Code Location Period type Year IsLatestYear Life Expectancy
184 JPN Japan Year 2021 True 84.46
183 SGP Singapore Year 2021 True 83.86
182 KOR Republic of Korea Year 2021 True 83.80
181 CHE Switzerland Year 2021 True 83.33
180 AUS Australia Year 2021 True 83.10
... ... ... ... ... ... ...
4 MOZ Mozambique Year 2021 True 57.66
3 SWZ Eswatini Year 2021 True 54.59
2 SOM Somalia Year 2021 True 53.95
1 CAF Central African Republic Year 2021 True 52.31
0 LSO Lesotho Year 2021 True 51.48

185 rows × 6 columns

The top rows now list the countries with the highest life expectancy in 2021. This makes it easier to see which countries are leading the way in health and overall longevity before I merge with the GDP per capita data to test the hypothesis.

GDP Per Capita Data

I then did the same with the cleaned GDP dataset: filter to 2021 and sort in descending order of GDP per capita, so that the countries with the highest GDP per capita appear at the top.

# Filtering the data for the year 2021
gdp_cleaned_2021 = gdp_cleaned[gdp_cleaned["Year"] == 2021]

# Sorting the data by GDP per capita in descending order
gdp_cleaned_2021_sorted = gdp_cleaned_2021.sort_values(by="GDP per Capita", ascending=False)

gdp_cleaned_2021_sorted
Country Name Country Code Year GDP per Capita
2300 Monaco MCO 2021 235132.7842
2283 Liechtenstein LIE 2021 197504.5489
2285 Luxembourg LUX 2021 133711.7944
2191 Bermuda BMU 2021 114274.6220
2262 Ireland IRL 2021 102001.7982
... ... ... ... ...
2272 Korea, Dem. People's Rep. PRK 2021 NaN
2315 Northern Mariana Islands MNP 2021 NaN
2347 South Sudan SSD 2021 NaN
2380 Venezuela, RB VEN 2021 NaN
2384 Yemen, Rep. YEM 2021 NaN

217 rows × 4 columns

The top rows now list the countries with the highest GDP per capita in 2021, while the rows at the bottom show countries with no 2021 value (NaN). This makes it easier to see which countries are leading the way in economic strength before I merge with the life expectancy data to test the hypothesis.
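
Since the sorted output ends in NaN rows, it may be worth counting how many countries lack a 2021 GDP value before merging. A minimal sketch, using four rows taken from the output above:

```python
import numpy as np
import pandas as pd

# Four rows copied from the sorted 2021 output: two with values, two missing.
toy_2021 = pd.DataFrame({
    "Country Code": ["MCO", "LUX", "PRK", "YEM"],
    "GDP per Capita": [235132.7842, 133711.7944, np.nan, np.nan],
})

# Count the missing values, then keep only rows that can be plotted.
n_missing = int(toy_2021["GDP per Capita"].isna().sum())
complete = toy_2021.dropna(subset=["GDP per Capita"])

print(f"{n_missing} of {len(toy_2021)} rows have no GDP value")
print(complete)
```

Rows without a GDP value cannot appear in a scatter plot, so the number of countries actually compared can be smaller than the number of rows in the merged table.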

Step 3: Data Analysis#

Individual Data Analysis & Visualization#

GDP per Capita#

Next, I decided to create individual visualizations for the 2021 GDP per capita data and life expectancy data to analyze each dataset on its own before merging them. For the GDP data, I went with a choropleth map because I wanted to capture the spatial distribution of wealth across the globe. Seeing the data on a worldwide map helps highlight regional patterns and pinpoint clusters of high and low GDP, which is super useful for understanding economic disparities 🌍💸.

# Creating a choropleth map for GDP per Capita in 2021
fig = px.choropleth(
    gdp_cleaned_2021,
    locations="Country Code",  # ISO country codes
    color="GDP per Capita",  # Color scale based on GDP per Capita
    hover_name="Country Name",  # Hover shows country name
    title="Worldwide GDP per Capita in 2021",
    color_continuous_scale="Blues",  # Use a blue color scale for visual effect
    labels={"GDP per Capita": "GDP per Capita (USD)"}
)

# Enforcing the range for the color axis
fig.update_layout(
    coloraxis_colorbar=dict(title="GDP per Capita"),
    coloraxis=dict(cmin=5000, cmax=90000),  # Force color scale range
    geo=dict(
        showcoastlines=True,
        coastlinecolor="Black",
        showland=True,
        landcolor="LightGray",
        showcountries=True,
        countrycolor="Black",
        projection_type="natural earth",
    ),
    height=600,
    margin={"r": 0, "t": 50, "l": 0, "b": 0},
)

# Displaying the map
fig.show()

🌍 The choropleth map of GDP per capita in 2021 shows a stark global disparity in economic wealth. Darker shades, representing higher GDP per capita, are concentrated in North America, Western Europe, and select regions in Asia and Oceania. Countries such as the United States, Canada, Luxembourg, Norway, and Singapore stand out prominently, reflecting their strong economies and high living standards. These are highly developed countries that likely have access to world-class healthcare, education, and infrastructure. Highly productive sectors such as mining, finance, and tourism also play an important role. However, a high average figure can still hide disparities in income distribution.

In contrast, much of Sub-Saharan Africa and parts of South Asia are depicted in lighter shades, indicating significantly lower GDP per capita. This highlights persistent economic challenges in these regions, which often face issues like underdeveloped infrastructure, limited access to resources, and political instability.

Additionally, areas such as the Middle East, particularly resource-rich countries like Qatar and the UAE, likely demonstrate high GDP per capita, although their smaller geographical representation on the map makes it less visually striking.

Life Expectancy#

For life expectancy, I created a bar chart to get a clear view of the distribution by country. This approach makes it easier to compare countries directly and spot outliers or trends, like which nations are leading the charge in longevity and which are falling behind. Breaking it down this way allows for a deeper understanding of each dataset before combining them for the bigger picture! 📊✨

# Create a bar chart of life expectancy by country (bars follow the dataframe's row order)
fig_countries = px.bar(
    life_exp_2021,
    x="Location",  # Country names
    y="Life Expectancy",  # Life expectancy values
    color="Country Code",  # One color per country (the data has no region column here)
    title="Life Expectancy Distribution by Country (2021)",
    labels={"Life Expectancy": "Life Expectancy (Years)", "Country Code": "Country"},
)

fig_countries.update_layout(
    xaxis_tickangle=45,  # Rotate x-axis labels for better visibility
    height=600
)

# Show the country-level distribution chart
fig_countries.show()

📊The bar chart provides a country-by-country breakdown of life expectancy in 2021, showcasing significant disparities across the globe. At the higher end of the spectrum, countries like Japan, Singapore, and Switzerland exhibit life expectancies above 80 years, reflecting advanced healthcare systems, robust social policies, and healthier lifestyles. These countries lead globally in longevity and are benchmarks for health and well-being.

In contrast, countries like Lesotho, the Central African Republic, and Somalia are positioned at the lower end, with life expectancies below 60 years. This highlights ongoing challenges such as limited healthcare access, malnutrition, and the impact of political instability and economic hardship.

The overall distribution emphasizes the stark divide between high-income nations and low-income nations regarding health outcomes. While many countries cluster around the global average of 70–75 years, the extremes highlight the crucial role of economic, social, and cultural factors in shaping life expectancy. This chart serves as a reminder of the pressing need to address inequalities in global health infrastructure and resources.

Merging the two Datasets#

Next, I merged the GDP and life expectancy datasets using the Country Code and Year columns as the keys. These are the most logical connections since they identify the country and time period of each record, and both datasets already use the same three-letter ISO codes, as checked in the cleaning and pre-processing step.

To keep things clean and accurate, I explicitly specified the key columns in the merge function. This step ensured that every country’s GDP per capita was aligned with its corresponding life expectancy for 2021. The result? A unified dataset that brings together economic and health indicators, which is exactly what I need to dive deeper into the analysis. Let’s see what insights this powerhouse combo reveals! 🚀📊

final_merged_2021 = pd.merge(
    life_exp_2021,
    gdp_cleaned_2021,
    on=["Country Code", "Year"],
    how="inner"
)

final_merged_2021
Country Code Location Period type Year IsLatestYear Life Expectancy Country Name GDP per Capita
0 LSO Lesotho Year 2021 True 51.48 Lesotho 1054.932740
1 CAF Central African Republic Year 2021 True 52.31 Central African Republic 461.137511
2 SOM Somalia Year 2021 True 53.95 Somalia 576.523678
3 SWZ Eswatini Year 2021 True 54.59 Eswatini 4068.573790
4 MOZ Mozambique Year 2021 True 57.66 Mozambique 504.037759
... ... ... ... ... ... ... ... ...
180 AUS Australia Year 2021 True 83.10 Australia 60697.245440
181 CHE Switzerland Year 2021 True 83.33 Switzerland 93446.434450
182 KOR Republic of Korea Year 2021 True 83.80 Korea, Rep. 35125.522500
183 SGP Singapore Year 2021 True 83.86 Singapore 79601.412960
184 JPN Japan Year 2021 True 84.46 Japan 40058.537330

185 rows × 8 columns
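
An inner merge silently drops countries that appear in only one dataset. One way to see what was dropped is an outer merge with pandas' indicator and validate options, sketched below on a toy pair of frames. The JPN, LSO, and MCO values come from the outputs above, while the "ZZZ" row is a hypothetical country added for illustration.

```python
import pandas as pd

life = pd.DataFrame({
    "Country Code": ["JPN", "LSO", "ZZZ"],  # "ZZZ" is a hypothetical code
    "Year": [2021, 2021, 2021],
    "Life Expectancy": [84.46, 51.48, 70.0],  # 70.0 is a placeholder value
})
gdp = pd.DataFrame({
    "Country Code": ["JPN", "LSO", "MCO"],
    "Year": [2021, 2021, 2021],
    "GDP per Capita": [40058.53733, 1054.93274, 235132.7842],
})

# indicator=True adds a "_merge" column; validate raises if keys are duplicated.
check = pd.merge(
    life, gdp,
    on=["Country Code", "Year"],
    how="outer",
    indicator=True,
    validate="one_to_one",
)
counts = check["_merge"].value_counts()
print(counts)
```

Rows marked left_only or right_only are the ones an inner merge would discard, and the one_to_one validation guards against a country appearing twice for the same year.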

Next, I saved final_merged_2021 as a CSV file in case I need it for future analysis, and to allow a quick check of all the columns and rows before building the merged visualizations.

final_merged_2021.to_csv("final_merged_2021.csv", index=False)

Step 4: Final Visualizations on the merged data#

  • Final Visualization:

    • Scatter Plots: With a scatter plot, it is easy to visually compare multiple countries and see where they fall on the spectrum of wealth and life expectancy, adding depth to the interpretation of the data

    • Choropleth Map: Storytelling with geography. The choropleth map adds a geographical layer to the analysis, showing how wealth and life expectancy are distributed globally, and potentially highlighting areas that are outperforming or lagging behind their neighbors

The Scatter Plots#

# Creating a scatter plot to analyze GDP per capita vs life expectancy in 2021
fig = px.scatter(
    final_merged_2021,  # Using the merged dataset
    x="GDP per Capita",  # GDP per capita on x-axis
    y="Life Expectancy",  # Life expectancy on y-axis
    color="Country Code",  # Different colors for each country
    animation_frame="Year",  # Only 2021 is present, so this yields a single frame
    hover_name="Country Name",  # Hover to show country names
    title="Relationship Between GDP per Capita and Life Expectancy in 2021",
    trendline="ols",  # Requires the statsmodels package
    trendline_scope="overall",  # Fit one line across all countries, not one per color (plotly 5.2+)
    labels={
        "GDP per Capita": "GDP per Capita (USD)",
        "Life Expectancy": "Life Expectancy (Years)",
        "Country Code": "Country",
    },
)

# Show the plot
fig.show()
# Keeping only the Country Code, Life Expectancy, and GDP per Capita columns
highest_gdplifeexp = final_merged_2021.filter(
    ["Country Code", "Life Expectancy", "GDP per Capita"], axis=1
)

# Resetting the index to make it cleaner
highest_gdplifeexp = highest_gdplifeexp.reset_index(drop=True)

# Displaying the first few rows of the filtered dataset
highest_gdplifeexp.head()
Country Code Life Expectancy GDP per Capita
0 LSO 51.48 1054.932740
1 CAF 52.31 461.137511
2 SOM 53.95 576.523678
3 SWZ 54.59 4068.573790
4 MOZ 57.66 504.037759

The Choropleth Map#

import plotly.express as px

# Creating a choropleth map of life expectancy in 2021, with GDP per capita shown on hover
fig = px.choropleth(
    final_merged_2021,  # Using the merged dataset
    locations="Country Code",  # ISO country codes for mapping
    color="Life Expectancy",  # Color scale based on Life Expectancy
    hover_name="Country Name",  # Show country names on hover
    hover_data={"GDP per Capita": True, "Life Expectancy": True},  # Include GDP per Capita in hover info
    animation_frame="Year",  # Only 2021 is present, so this yields a single frame
    title="Choropleth: Global Relationship Between GDP per Capita and Life Expectancy in 2021 🌍",
    labels={
        "Life Expectancy": "Life Expectancy (Years)",
        "GDP per Capita": "GDP per Capita (USD)",
        "Country Code": "Country",
    },
    color_continuous_scale="Viridis",  # A diverse color scale for better contrast
)

# Updating the layout for a clean and interactive display
fig.update_layout(
    height=600,
    margin={"r": 0, "t": 50, "l": 0, "b": 0},
    geo=dict(
        projection_type="natural earth",
        showcoastlines=True,
        coastlinecolor="Black",
        showland=True,
        landcolor="LightGray",
    ),
)

# Show the interactive map
fig.show()

Summarize key findings between two datasets: In 2021, does GDP per capita strongly correlate with life expectancy?

Step 5: 🌍 Let’s dive into the findings!#

From the Scatter Plot

The scatter plot visualizes the relationship between GDP per capita and life expectancy for countries in 2021, showing a clear positive correlation. As GDP per capita increases, life expectancy generally rises, though the relationship is not perfectly linear. Countries with low GDP per capita cluster in the lower-left, where life expectancy tends to be below 70 years. However, beyond a certain GDP per capita threshold (around USD 40,000), the slope of the line tends to flatten as GDP per capita increases, indicating that the impact of higher GDP per capita on life expectancy diminishes at higher income levels. This is a common pattern, where initial increases in income significantly improve life quality and health outcomes, but the marginal impact lessens at higher income levels.
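
The visual impression of a positive correlation can be backed by a number. The sketch below computes Pearson and rank (Spearman-style) correlations on the ten rows visible in the merged output above. Because those rows are only the lowest and highest life expectancies, the resulting figures are illustrative and should be recomputed on the full merged dataframe.

```python
import numpy as np
import pandas as pd

# The ten rows shown in the merged output (bottom five and top five by life expectancy).
sample = pd.DataFrame({
    "Country Code": ["LSO", "CAF", "SOM", "SWZ", "MOZ",
                     "AUS", "CHE", "KOR", "SGP", "JPN"],
    "Life Expectancy": [51.48, 52.31, 53.95, 54.59, 57.66,
                        83.10, 83.33, 83.80, 83.86, 84.46],
    "GDP per Capita": [1054.932740, 461.137511, 576.523678, 4068.573790, 504.037759,
                       60697.245440, 93446.434450, 35125.522500, 79601.412960, 40058.537330],
})

gdp = sample["GDP per Capita"]
life = sample["Life Expectancy"]

# Pearson correlation on raw GDP and on log10(GDP).
pearson_raw = gdp.corr(life)
pearson_log = np.log10(gdp).corr(life)

# Rank correlation: Pearson correlation of the ranks (no ties in this sample).
spearman = gdp.rank().corr(life.rank())

print(f"Pearson (raw GDP):   {pearson_raw:.3f}")
print(f"Pearson (log10 GDP): {pearson_log:.3f}")
print(f"Spearman (ranks):    {spearman:.3f}")
```

Comparing the raw and log-scale coefficients is one way to check the flattening described above: if the relationship bends at high incomes, the log-scale correlation is usually the stronger of the two.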

However, there are a few outliers: countries with a relatively high GDP per capita (above USD 60,000) whose life expectancy falls below the general trend, namely Qatar and the U.S. (both at about 76 years). These countries could be experiencing inequality or other social factors that hold back life expectancy despite high income levels.

From the Choropleth map

The choropleth map highlights a clear pattern in 2021: countries with higher GDP per capita tend to have higher life expectancy, represented by lighter shades on the map. This trend is most evident in regions like Western Europe, North America, and parts of Asia (e.g., Japan and Singapore), where economic prosperity 💸 aligns with access to advanced healthcare, better nutrition, and improved living conditions. Conversely, darker shades dominate in lower-income regions, particularly in Sub-Saharan Africa, where limited resources, weaker healthcare systems, and socioeconomic challenges contribute to shorter life expectancies. While the correlation is strong, it does not by itself imply a causal relationship between the variables. Notable exceptions like Japan (with a high life expectancy despite a comparatively moderate GDP per capita) emphasize the influence of cultural and lifestyle factors alongside economic wealth.

Here is the interesting part:

Even countries with sky-high GDP per capita (think over USD 100,000) are not showing drastically higher life expectancy compared to those sitting around USD 60,000 to USD 80,000. This just goes to show that money is not everything: factors like efficient healthcare, strong social systems, and healthy lifestyles clearly play a massive role.

On the flip side, countries like Japan, with a more moderate GDP per capita, manage to achieve incredible life expectancy, proving that cultural habits, diet, and healthcare quality can outweigh pure economic wealth. So, while GDP per capita is definitely a solid predictor of life expectancy, it’s far from the whole story.
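
The diminishing-returns pattern described here is often summarized with a log-linear model, life expectancy ≈ a + b · log10(GDP per capita). The sketch below fits that model with NumPy on the ten rows visible in the merged output. The model choice is an assumption for illustration, not part of the original analysis, and the coefficients from such a small, extreme sample should not be read as results.

```python
import numpy as np

# Ten rows from the merged output: LSO, CAF, SOM, SWZ, MOZ, AUS, CHE, KOR, SGP, JPN.
codes = ["LSO", "CAF", "SOM", "SWZ", "MOZ", "AUS", "CHE", "KOR", "SGP", "JPN"]
gdp = np.array([1054.932740, 461.137511, 576.523678, 4068.573790, 504.037759,
                60697.245440, 93446.434450, 35125.522500, 79601.412960, 40058.537330])
life = np.array([51.48, 52.31, 53.95, 54.59, 57.66,
                 83.10, 83.33, 83.80, 83.86, 84.46])

# Least-squares fit of life expectancy against log10(GDP per capita).
slope, intercept = np.polyfit(np.log10(gdp), life, deg=1)

# Residual = observed minus fitted; positive means longer-lived than the fit predicts.
predicted = intercept + slope * np.log10(gdp)
residuals = life - predicted

print(f"Years of life expectancy per tenfold increase in GDP: {slope:.2f}")
for code, res in sorted(zip(codes, residuals), key=lambda pair: pair[1]):
    print(f"{code}: {res:+.2f}")
```

Under this model the slope reads as the change in life expectancy associated with a tenfold rise in GDP per capita, and countries with large residuals are the ones that sit well above or below what their income level alone would suggest.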

Step 6: Conclusion ✨#

Based on the analysis of 179 countries in 2021, people tend to live longer in countries with a high GDP per capita. In this snapshot, high-income countries do not show up among the shortest life expectancies, and low-income countries do not show up among the longest. However, the story doesn’t end there: life expectancy can vary significantly even among countries with similar income levels. It likely comes down to how the wealth is distributed and, more importantly, how it’s spent. Investments in healthcare, education, and social support make a difference, suggesting that it’s not just about having money; it’s about using it wisely.

Proving the Hypothesis

The hypothesis proposed that countries with higher GDP per capita would have higher life expectancy due to better access to healthcare, nutrition, and living standards. The findings mostly support this hypothesis: higher GDP per capita generally correlates with longer life expectancy, which is consistent with wealth providing the foundation for better quality of life and healthcare infrastructure. High-income countries, on average, have better healthcare systems, nutrition, and living conditions, which all contribute to a longer lifespan.

However, this research also uncovered critical nuances. Wealth alone is insufficient to guarantee the highest life expectancy; the efficiency of healthcare, public policies, social cohesion, and lifestyle choices play major roles as well. Countries like Japan with moderate GDP per capita, but highly effective healthcare and cultural habits, consistently show high life expectancy—suggesting that the relationship is more complex than just wealth.

Answering the Research Question

“Is there a positive correlation between GDP per capita and life expectancy across countries in 2021, suggesting that economic prosperity leads to improved healthcare access, nutrition, and standards of living?”

The analysis indicates a positive correlation between GDP per capita and life expectancy, affirming that economic prosperity indeed plays a role in improving living conditions, access to quality healthcare, and nutrition. However, the correlation is not linear or absolute. The data show that, beyond a certain threshold of wealth, life expectancy improvements are influenced more by how the resources are utilized—investments in healthcare systems, societal factors, and lifestyle choices. Therefore, while higher GDP per capita generally indicates better life expectancy, true longevity also depends on a holistic approach that involves effective healthcare systems, quality of public services, and social well-being.

This deeper understanding not only supports the hypothesis but also highlights that effective policy-making and health-focused investments are crucial for achieving the best outcomes in life expectancy, regardless of GDP levels.

Site URL#

Click here to see the publish section 😁